Completion of the tomato genome sequencing project has broad impacts on genetic and genomic studies of 
tomato and Solanaceae plants. The reference genome sequence derived from Solanum lycopersicum cv 
'Heinz 1706' serves as the firm basis for sequencing-based approaches to tomato genomics. In this article, 
we first present a brief summary of the genome sequencing project and a summary of the reference genome 
sequence. We then focus on recent progress in transcriptome sequencing and small RNA sequencing and 
show how the reference genome sequence makes these analyses more comprehensive than before. We discuss 
the potential of in-depth analysis that is based on DNA methylome sequencing and transcription start-site 
detection. Finally, we describe the current status of efforts to resequence S. lycopersicum cultivars to 
demonstrate how resequencing can allow the use of intraspecific genomic diversity for detailed phenotyping 
and breeding. 
Introduction 

The tomato (Solanum lycopersicum) is regarded as a model 
plant that represents the Solanaceae family, which com- 
prises 1000-2000 species that grow in all habitats from 
rainforests to deserts (Knapp 2002). Additionally, tomato is 
regarded as a model plant for the study of fruit development 
(Giovannoni 2004). Many Solanaceae plants, including 
potato, pepper, eggplant, tobacco and petunia, have highly 
syntenic genomes that each comprise 12 chromosomes; 
therefore, the reference genome sequence of the tomato was 
long awaited for molecular breeding of Solanaceae crops 
that are important for human nutrition. 

The International Solanaceae Project (SOL, 
http://solgenomics.net/solanaceae-project/index.pl) launched the 
tomato genome sequencing project in November 2003 
(Mueller et al. 2005). The aim of this sequencing project was 
to provide an information basis that could be used to link 
traits of Solanaceae plants to DNA sequence. This genome 
information is expected to lead to a deeper understanding 
of how plant diversity is generated from a common set of genes. 
After an intensive collaboration of plant scientists from 14 
countries, this sequencing project was completed, and 
an annotated reference sequence and all findings were 
published in May 2012 (The Tomato Genome Consortium 
2012). The published sequence is highly accurate and hence 
serves as a reliable basis for further genomic studies. 

With the prevalence of next generation sequencing (NGS) 
technology, the tomato genome sequence will facilitate a 
wide range of genetic and genomic studies that are based on 
comparative and in-depth sequence analysis. For example, 
resequencing of S. lycopersicum varieties sets the stage for 
linking phenotypic variation to DNA sequence variation; 
morphological and metabolic phenotypes of many economi- 
cally important tomato cultivars — including S. lycopersicum 
varieties — have been intensively investigated; findings from 
such studies can be meaningfully reevaluated in the context 
of high-resolution sequence data. Another application is 
comprehensive sequencing of tomato transcripts. Owing to the 
versatility of NGS technology, transcriptome analysis goes 
far beyond conventional gene-expression profiling and 
facilitates comprehensive detection of small interfering RNA 
(siRNA), non-coding RNA (ncRNA) and splicing variants. 
Transcriptome analyses using NGS technology have led to 
the characterization of previously unrecognized mechanisms 
of gene regulation. 

This review summarizes recent advances in 
sequencing-based genomics research on tomato in four parts. 
First, we give a brief overview of the tomato genome 
sequencing project. Second, we present an overview of the 
'Heinz 1706' reference genome sequence. Third, we 
summarize recent progress with various types of transcriptome 
analysis and discuss the possibilities for further 
functional analysis based on DNA methylation and 
transcription start-site analysis. Finally, we present 
genome resequencing projects that involve S. lycopersicum 
cultivars and wild relatives of domestic tomatoes. 

Tomato genome sequencing: transition from Sanger 
sequencing era to NGS era 

Within the history of genome sequencing, the tomato ge- 
nome project occurred during the transition period between 
multi-parallel Sanger sequencing and NGS. In 2004, the 
project was originally launched by the SOL as a consortium 
sequencing project, and it involved 10 countries (Korea, 
China, UK, India, The Netherlands, France, Japan, Spain, 
Italy and USA). The 'Heinz 1706' cultivar, which was 
provided by the Heinz Corporation (Pittsburgh, PA), was used 
for this sequencing project because the original HindIII 
BAC library was made using this cultivar. This sequencing 
project initially involved a BAC-by-BAC sequencing 
approach that had been successfully applied to earlier 
model plants such as Arabidopsis thaliana (Arabidopsis Genome 
Initiative 2000), Oryza sativa (International Rice Genome 
Sequencing Project 2005) and Lotus japonicus (Sato et al. 
2008). Three BAC libraries (specifically an EcoRI, an MboI 
and a HindIII library) were constructed. In this approach, a 
limited number of BAC clones were anchored to the genome 
(Peters et al. 2006). To anchor BAC clones, individual 
clones were screened for the presence of molecular genetic 
markers and marker-positive BACs were linked to the re- 
spective genetic loci defined by the respective markers. A 
fluorescent in situ hybridization (FISH) approach was used 
to verify the chromosome map positions of individual BACs 
and to delineate euchromatin/heterochromatin boundaries 
(Peterson et al. 1999). The 12 chromosomes were divided 
among the 10 participating countries for BAC-by-BAC 
sequencing. Concurrently, Argentina and Italy sequenced the 
mitochondrial (http://www.mitochondrialgenome.org/) and 
chloroplast (NCBI accession number: NC_007898) 
(Kahlau et al. 2006) genomes, respectively, although the 
mitochondrial genome sequence has not yet been finished. This 
BAC-by-BAC approach was used to sequence 263 Mb, which 
includes 36% of the previously registered tomato ESTs. 

In 2008, the sequencing consortium adopted the Selected 
BAC Mixture (SBM) approach to accelerate progress (The 
Tomato Genome Consortium 2012). Based on sequences 
from the ends of BACs and the criterion that at least one such 
end had no similarity to repetitive sequence, 30,800 
BAC clones were selected. These selected BACs were 
pooled and sequenced using a Sanger-based shotgun 
approach; 3.1 Gb was sequenced via the SBM method, and 
these 3.1 Gb covered 540 Mb of the genome. The SBM 
contigs were merged with the BAC-by-BAC contigs; together, 
these contigs cover 81% of the previously registered tomato 
ESTs (http://www.kazusa.or.jp/tomato/). The success of the 
shotgun approach prepared the way for an NGS approach. 

In 2009, the shotgun approach was applied to the whole 
tomato genome using emerging NGS platforms. Three NGS 
platforms, 454 (Margulies et al. 2005), SOLiD (McKernan 
et al. 2009) and Illumina (Harris et al. 2008), were used to 
generate 21 Gb, 64 Gb and 82 Gb of NGS reads, respectively 
(Ahmadian et al. 2006, Ju et al. 2006). A de novo 
assembly of the 'Heinz 1706' genome was initially based on 
454 and Sanger reads. High-quality BAC end sequences and 
high-coverage Illumina and SOLiD datasets were used to fill 
gaps and to improve overall base accuracy. The resulting 
tomato genome consisted of 91 scaffolds covering 
760 Mb, which were in turn aligned with the 12 chromosomes. 
A combination of Sanger and NGS technologies was used to 
achieve high base accuracy, with only one substitution error 
per 29.4 kb and only one indel error per 6.4 kb. 
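
To put the yields and error rates quoted above on a common scale, the nominal sequencing depth and a Phred-style quality equivalent can be computed directly. The following is a minimal sketch and not part of the consortium's pipeline: the function names are hypothetical, and dividing raw yield by the 760 Mb assembly length gives only a nominal depth, because not every read maps to the assembly.

```python
import math

def fold_coverage(sequenced_bp: float, target_bp: float) -> float:
    """Nominal depth: bases sequenced divided by bases in the target."""
    return sequenced_bp / target_bp

def phred_equivalent(bases_per_error: float) -> float:
    """Phred-scaled quality, Q = -10 * log10(error probability)."""
    return -10.0 * math.log10(1.0 / bases_per_error)

# Figures taken from the text; the dictionary keys are just labels.
ASSEMBLY_BP = 760e6
NGS_YIELD_BP = {"454": 21e9, "SOLiD": 64e9, "Illumina": 82e9}

# Redundancy of the SBM shotgun data over the 540 Mb it covered.
sbm_redundancy = fold_coverage(3.1e9, 540e6)

# Nominal depth of each NGS platform over the assembled genome.
ngs_depth = {name: fold_coverage(bp, ASSEMBLY_BP) for name, bp in NGS_YIELD_BP.items()}
total_ngs_depth = sum(ngs_depth.values())

# Quality equivalents of the reported error rates.
q_error_29kb = phred_equivalent(29.4e3)
q_indel = phred_equivalent(6.4e3)

print(f"SBM redundancy: {sbm_redundancy:.1f}x")
for name, depth in ngs_depth.items():
    print(f"{name}: {depth:.0f}x")
print(f"Total NGS depth: {total_ngs_depth:.0f}x")
print(f"One error per 29.4 kb ~ Q{q_error_29kb:.1f}; one indel per 6.4 kb ~ Q{q_indel:.1f}")
```

Under these assumptions the combined NGS data amount to roughly 220-fold nominal depth, which helps explain how short reads could raise base accuracy well above that of the initial 454 and Sanger assembly.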

As described here, the change in sequencing approach 
over the course of the tomato genome sequencing project 
coincided with the transition from Sanger sequencing tech- 
nology to NGS technologies. In the initial stage of the 
project, the goal was to sequence the euchromatic regions 
(size estimate 220 Mb), and sequencing the euchromatic 
regions was thought to be less than twice the effort of se- 
quencing the Arabidopsis genome (150 Mb) and a moderate 
goal for BAC-by-BAC Sanger sequencing. But ultimately, 
the whole genome (760 Mb) was sequenced essentially via a 
shotgun approach that depended on NGS technology; this 
achievement demonstrated that genome size is not a limiting 
factor. During later stages of the project, advances in bio- 
informatics greatly facilitated the mapping and assembly of 
the relatively short, but highly redundant, reads that were 
generated with the NGS platforms. Projects that follow pub- 
lication of this highly accurate reference genome involve in- 
depth sequencing of RNAs. 

Overview of the tomato genome 

Before describing any post-genome-sequencing studies, let 
us have a quick overview of the 'Heinz 1706' reference ge- 
nome sequence (The Tomato Genome Consortium 2012). 
Each of the 12 chromosomes consists of pericentric hetero- 
chromatin and of euchromatin at the distal ends. The re- 
combination rates and the gene and transcript densities are 
higher in euchromatin than in heterochromatin. Based on 
ITAG Release 2.3 
(http://solgenomics.net/organism/Solanum_lycopersicum/genome), 
there are 34,727 predicted genes in the reference genome; 
based on RNA sequencing data, 30,855 of these predicted 
genes correspond to transcribed genes. The tomato genome is 
highly syntenic with those of other commercially important 
Solanaceae plants such as potato, eggplant, pepper and tobacco. 

Comparison of the reference tomato genome with those 
of plants in the euasterids (Mimulus, Lactuca and 
Helianthus) or rosids (Vitis and Arabidopsis) revealed 
that two consecutive genome triplication events occurred, 
the first when the rosid and euasterid lineages diverged 
approximately 130 million years ago and the next when the 
euasterid I and euasterid II lineages diverged approximately 
60 million years ago. These two genome triplication events 
set the stage for evolution of genes involved in fleshy fruit 
development; duplicated genes acquired new and distinct 
functions. This group of genes includes transcription factors 
(RIN (Vrebalov et al. 2002), CNR (Manning et al. 2006)), 
enzymes necessary for ethylene biosynthesis and signaling 
(ACS (Nakatsuka et al. 1998), ETR (Klee and Giovannoni 
2011)), red-light photoreceptors that are associated with 
fruit quality (PHYB1, PHYB2 (Pratt et al. 1995)), and 
enzymes necessary for lycopene biosynthesis (PSY1, PSY2 
(Giorio et al. 2008)). Conversely, cytochrome P450 gene 
subfamilies that are involved in biosynthesis of toxic 
glycoalkaloids show contraction or complete loss in tomato. 

Table 1. Publicly available RNA-Seq and sRNA-Seq datasets from tomato (September 2012) 

| Submission ID or Accession ID | NGS platform | Strategy | Samples | Reference |
|---|---|---|---|---|
| SRA049915 | Illumina HiSeq2000 | RNA-Seq | S. lycopersicum cv 'Heinz 1706': 1 cm fruit, 2 cm fruit, 3 cm fruit, MG[a], B[b], B10[c], bud, flower, leaf, root. S. pimpinellifolium: IMG[d], B[b], B5[c], leaf | The Tomato Genome Consortium (2012) |
| SRA047925 | 454 GS FLX Titanium | RNA-Seq | S. lycopersicum cv 'MoneyMaker': root, stem, leaf, flower, MG[a], B[b], R[e]. S. pimpinellifolium: leaf, R[e] | The Tomato Genome Consortium (2012) |
| SRA050797 | AB SOLiD System 3.0 | RNA-Seq | S. lycopersicum cv 'Heinz 1706': young leaves, old leaves, roots, stems, flowers, fruits | The Tomato Genome Consortium (2012) |
| SRA027382 | 454 GS FLX | RNA-Seq | S. lycopersicum cv 'Ailsa Craig': 1 cm fruit, MG[a], B[b], B7[c], B7[c] (rin), B7[c] (nor), B7[c] (Nr), B7[c] (hp1), B7[c] (apricot), B7[c] (TAGL1-RNAi). S. lycopersicum cv 'M82': pollen, unpollinated style, pollinated style. S. lycopersicum M82 x M82: pollinated style. S. pennellii: pollen. S. pennellii LA716: pollen. S. pennellii introgression line IL2-2: B7[c]. S. pennellii LA716 x LA716: pollinated style. S. pennellii LA716 x M82: pollinated style. S. pennellii M82 x LA716: pollinated style | Lopez-Casado et al. (2012) |
| GSE12081 | 454[f] | Small RNA-Seq | S. lycopersicum cv 'Micro-Tom': leaf, 1-15 mm green fruits | Moxon et al. (2008) |
| GSE18110 | Illumina Genome Analyzer | Small RNA-Seq | S. lycopersicum cv 'Micro-Tom': bud, flower, 1-3 mm fruit, 5-7 mm fruit, 11-14 mm fruit, MG[a], B[b], B3[c], B5[c], B7[c] | Mohorianu et al. (2011) |
| GSE32470 | Illumina Genome Analyzer II | Small RNA-Seq | S. lycopersicum cv 'Heinz 1706': leaves, flowers, fruit | The Tomato Genome Consortium (2012) |

[a] MG, mature green fruit; [b] B, breaker fruit; [c] Bn, breaker + n day fruit; [d] IMG, immature green fruit; [e] R, ripe fruit; [f] Model was not identified in the record. 



Transcriptome analyses 

Gene expression profiling 

The reference sequence of the tomato genome has opened 
a fast lane for transcriptome analysis, for which NGS 
technology is also used. A comprehensive way to measure 
transcriptome composition is direct high-throughput 
sequencing of cDNA, namely RNA-Seq 
(Nagalakshmi et al. 2008). If enough reads are collected 
from a sample, normalized read counts can be used to 
estimate gene expression levels (Mortazavi et al. 2008). The 
publicly available RNA-Seq and small RNA (sRNA)-Seq 
datasets are listed in Table 1. 
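
The normalization referred to above can be illustrated with RPKM (reads per kilobase of transcript per million mapped reads), the measure introduced by Mortazavi et al. (2008). The sketch below is only an illustration: the gene names, read counts and lengths are invented, and the library size of 10 million mapped reads is an assumption loosely based on the average read number reported for the 'Heinz 1706' samples.

```python
def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)

# Hypothetical example: two genes with equal raw counts but different lengths.
TOTAL_MAPPED_READS = 10_000_000            # assumed library size
counts = {"geneA": 500, "geneB": 500}      # invented read counts
lengths_bp = {"geneA": 1000, "geneB": 2000}  # invented transcript lengths

expression = {
    gene: rpkm(counts[gene], lengths_bp[gene], TOTAL_MAPPED_READS)
    for gene in counts
}

for gene, value in expression.items():
    print(f"{gene}: {value:.1f} RPKM")
```

Although both hypothetical genes collect the same number of reads, the longer transcript receives half the RPKM value, which is why raw counts alone cannot be compared between genes or between libraries of different depth.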

NGS platforms were used to examine the tissue-specific 
expression profiles of many tomato genes. Specifically, 10 
tissues from the 'Heinz 1706' cultivar (root, leaf, bud, 
flower, 1 cm fruit, 2 cm fruit, 3 cm fruit, mature green fruit, 
breaker fruit and breaker + 10 day (red) fruit) were 
subjected to RNA-Seq analysis using the Illumina platform 
(Table 1). The average number of reads per replicate sample 
was 10 ± 1.6 million. Similarly, S. pimpinellifolium tissues, 
including leaf, immature green fruit, breaker fruit and 
breaker + 5 day fruit, were subjected to RNA-Seq analysis 
using the Illumina platform (Table 1). The 454 platform was 
also used to examine tissue-specific gene expression 
profiles; RNA from seven tissues of S. lycopersicum cv 
'MoneyMaker' (root, stem, leaf, flower, mature green fruit, 
breaker fruit and ripe fruit) and two tissues of 
S. pimpinellifolium (leaf and ripe fruit) was used 
in these analyses (Table 1). 

Mapping of the RNA-Seq data onto the reference genome 
sequence demonstrates that some transcripts originate in 
genomic regions that do not contain protein-coding genes; 
these transcripts may include non-coding RNAs and may 
function in the regulation of RNA accumulation via a pro- 
tein-independent mechanism. 

In the 'omics' framework, transcriptome sequencing can 
provide firm support for protein identification in proteomics 
analysis. Proteomic profiling that derives from accurate 
mass spectrometry depends heavily on the availability of a 
DNA reference database. Thus, the capacity for protein 
identification is limited in non-model organisms due to a 
lack of high-quality reference databases. However, RNA 
sequencing on NGS platforms may be useful for generating 
reliable reference databases at low cost, and such databases 
should facilitate efficient matching of peptide masses to 
corresponding gene sequences. 

This concept was tested by comparing the efficiency of 
protein identification using a custom RNA-Seq-based 
transcript database with that using a public database, the 
tomato unigene build released in June 2009 by the Sol 
Genomics Network (SGN; http://solgenomics.net/) 
(Lopez-Casado et al. 2012). To construct custom unigene 
databases, RNA-Seq data from 454 sequencing of mature 
pollen, style, leaf and fruit of S. lycopersicum cv 'M82' and 
two wild relatives (S. pennellii and S. habrochaites) were 
used to assemble transcript sequences (Lopez-Casado et al. 
2012). Quantitative proteomic analysis of pollen 
samples was conducted using the isobaric tag for relative 
and absolute quantitation (iTRAQ) method (Wiese et al. 
2007); in the iTRAQ method, quantitative information is 
represented by isotope-encoded 'reporter ions' that are 
observed only in MS/MS spectra (Ross et al. 2004). To 
evaluate the potential of a custom RNA-Seq database for 
protein identification, peptide searches were performed using 
either the custom RNA-Seq database or the SGN unigene 
database, and the numbers of proteins identified with each 
database were compared. More proteins were identified with 
the custom RNA-Seq database than with the SGN unigene 
database, yet the percentages of identified mass spectra were 
similar. More importantly, the percentages of matched amino 
acids in a peptide were comparable between the two 
databases. These results indicate that a custom RNA-Seq 
database can be used as a reliable reference database for 
proteomics analysis and is therefore valuable for proteomics 
of non-model plants. 
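
The logic of comparing two reference databases can be sketched in a much simplified form. Real peptide searches match MS/MS spectra against theoretical fragment masses; the sketch below skips the mass-matching step and simply asks which database entries contain at least one observed tryptic peptide. All sequences, identifiers and function names are hypothetical, and the in-silico digestion uses the common rule of cleaving after K or R unless the next residue is P.

```python
import re

def tryptic_peptides(protein: str, min_len: int = 6) -> set:
    """In-silico trypsin digest: cleave after K or R, but not before P."""
    pieces = re.split(r"(?<=[KR])(?!P)", protein)
    return {p for p in pieces if len(p) >= min_len}

def identified_proteins(database: dict, observed_peptides: set) -> set:
    """Return database entries that share at least one peptide with the observations."""
    hits = set()
    for name, sequence in database.items():
        if tryptic_peptides(sequence) & observed_peptides:
            hits.add(name)
    return hits

# Hypothetical toy databases: the custom database holds more sequences.
custom_db = {
    "custom_1": "AAAAAAKCCCCCCR",
    "custom_2": "DDDDDDKEEEEEER",
    "custom_3": "GGGGGGRPGGGK",   # the R-P bond is not cleaved
}
public_db = {"unigene_1": "AAAAAAKCCCCCCR"}

# Hypothetical peptides "observed" in the mass spectrometer.
observed = {"AAAAAAK", "EEEEEER", "GGGGGGRPGGGK"}

custom_hits = identified_proteins(custom_db, observed)
public_hits = identified_proteins(public_db, observed)
print(f"custom database: {len(custom_hits)} proteins identified")
print(f"public database: {len(public_hits)} proteins identified")
```

In this toy case the broader custom database accounts for peptides that the smaller public database cannot explain, which mirrors the direction of the result reported by Lopez-Casado et al. (2012) without reproducing their method.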



Small RNA profiling 

The availability of a reference genome sequence has 
extended the exploration of small RNA function to 
tomato. Recent studies established that 21-24 nt small 
RNAs (sRNAs), which are generated from double-stranded 
RNA (dsRNA) by Dicer-like (DCL) family nucleases, are 
involved in the control of gene expression (Phillips et al. 
2007). These dsRNAs can be formed by two different 
mechanisms; specifically, microRNA (miRNA) is generated 
from a precursor RNA that has a short hairpin structure, 
whereas small interfering RNA (siRNA) is produced from 
long dsRNA, the formation of which depends on the 
activity of RNA-dependent RNA polymerase (Brodersen and 
Voinnet 2006). 

Before the tomato genome was completed, Sanger se- 
quencing was used to profile tomato sRNA. The first com- 
prehensive sRNA profiling was reported in 2007 (Pilcher et 
al. 2007); 4,108 sRNA were cloned from mature green fruit 
and nine known and three novel miRNAs were identified. 
None of these 12 miRNAs had homology to Arabidopsis 
miRNAs; this finding indicated that the 12 miRNAs each 
have a species-specific role in tomato. Itaya and coworkers 
(Itaya et al. 2008) and Zhang and coworkers (Zhang et al. 
2008) reported similar results. Deep sequencing using an NGS 
platform was then reported in 2008 (Moxon et al. 2008). 
Sequencing libraries were produced from leaf, bud and green 
fruit (1 to 15 mm diameter) of the dwarf cultivar 'Micro- 
Tom' (Meissner et al. 1997). This group used the 454 plat- 
form to generate 721,874 reads that yielded 225,000 and 
102,000 unique sequences from fruits and leaves, respec- 
tively (Table 1). From these reads, 20 sequences matched 
known miRNAs (miR156, miR159, miR160, miR162, 
miR164, miR165, miR165, miR166, miR167, miR168, 
miR169, miR170, miR171, miR172, miR319, miR390, 
miR393, miR396, miR399, miR894); when 2-nt mismatches 
were allowed, 10 additional sequences matched known 
miRNAs (miR394, miR395, miR397, miR398, miR408, 
miR472, miR482, miR828, miR858, miR1151). Tissue- 
dependent expression levels were examined on northern 
blots. Interestingly, the expression of some individual 
miRNAs differed at different stages of fruit development. 
For example, the accumulation of miR390, which may 
regulate genes that encode receptor-like kinases, was much 
higher in very small fruits than in leaves or flower buds, but 
miR390 accumulation was very low in mature fruits. This 
finding indicates that miR390 has a specific role in early 
fruit formation. The size distribution of non-redundant 
sRNA had a peak at 21 nt in leaf, but in fruits, there were 
more 23- or 24-nt sRNAs than 21-nt sRNAs. The 23- or 24-nt 
sRNAs are thought to be generated via an RNA polymerase 
IV-dependent pathway that produces heterochromatin-related 
siRNA (Onodera et al. 2005). Thus, this result suggested 
that transcriptional control by DNA methylation, 
triggered by 23- or 24-nt sRNAs, is more extensive in 
fruit tissues than in leaf. 
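
The size distribution of non-redundant sRNAs described above is obtained by collapsing identical reads and counting the remaining sequences by length. A minimal sketch follows; the reads are artificial placeholders rather than real tomato sequences, and the 18-30 nt window is an assumed filter, not one stated in the text.

```python
from collections import Counter

def size_distribution(reads, min_len: int = 18, max_len: int = 30) -> dict:
    """Count non-redundant (unique) sRNA sequences by length."""
    unique_sequences = set(reads)  # collapse identical reads
    counts = Counter(
        len(seq) for seq in unique_sequences if min_len <= len(seq) <= max_len
    )
    return dict(sorted(counts.items()))

def peak_length(distribution: dict) -> int:
    """Length class with the most unique sequences."""
    return max(distribution, key=distribution.get)

# Artificial placeholder reads (homopolymers, for illustration only).
leaf_reads = ["A" * 21, "A" * 21, "C" * 21, "G" * 21, "T" * 24]
fruit_reads = ["A" * 24, "C" * 24, "G" * 23, "T" * 21, "T" * 21]

leaf_dist = size_distribution(leaf_reads)
fruit_dist = size_distribution(fruit_reads)
print("leaf:", leaf_dist, "peak at", peak_length(leaf_dist), "nt")
print("fruit:", fruit_dist, "peak at", peak_length(fruit_dist), "nt")
```

Counting unique sequences rather than total reads matters here: a few highly abundant 21-nt miRNAs can dominate raw read counts even when the diversity of 24-nt siRNAs is greater.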

More detailed sRNA profiling, covering 
10 time points from closed buds to red-ripe fruit of the 
'Micro-Tom' cultivar, was reported in 2011 (Mohorianu et 
al. 2011); this work was based on the preliminary (~43% 
complete) 'Heinz 1706' genome (Table 1). The preliminary 
genome sequence facilitated the profiling of not just miRNAs 
but also many other sRNAs. When sRNA reads were 
mapped to the preliminary genome, 43,336 sRNA-producing 
loci were identified. Analysis of the sRNA 
expression profiles revealed that 24-nt sRNAs predominate in 
the flowering stages, but that representation of 21-nt forms 
increases in the late stages of fruit development. This result 
clearly demonstrates that sRNA expression is not random 
but is timed to coincide with the stages of fleshy fruit 
development. Most of the sRNAs that did not match known 
miRNAs were differentially expressed during fruit 
development. Expression profiles of the 43,336 sRNAs were 
classified into 63 co-expression clusters according to 
similarity in developmental expression pattern. One of the 
intriguing findings is that many clusters showed dominance of a 
single sRNA length. For example, two clusters with similar 
expression profiles, both of which show a remarkable 
drop at the onset of fruit development (Cluster C, 
consisting of 41 sRNAs, and Cluster D, consisting of 13 
sRNAs), differed in size-class composition, with a clear 
dominance of the 24-nt class in Cluster C and of the 
22-nt class in Cluster D. This suggests that different sRNA 
biogenesis mechanisms are specifically and independently 
regulated throughout fruit development. 
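
The notion of a cluster being dominated by one size class can be made concrete with a small calculation. In the sketch below only the cluster sizes (41 and 13 sRNAs) come from the text; the length compositions are invented for illustration, and the 50% threshold for calling a class dominant is an assumption.

```python
from collections import Counter

def dominant_size_class(lengths, min_fraction: float = 0.5):
    """Return (length, fraction) of the most common size class.

    The length is None when no class reaches min_fraction of the cluster.
    """
    if not lengths:
        raise ValueError("cluster contains no sRNAs")
    length, count = Counter(lengths).most_common(1)[0]
    fraction = count / len(lengths)
    return (length if fraction >= min_fraction else None, fraction)

# Invented compositions; only the totals (41 and 13) follow the text.
cluster_c = [24] * 30 + [21] * 6 + [22] * 5
cluster_d = [22] * 9 + [24] * 4

for name, cluster in (("Cluster C", cluster_c), ("Cluster D", cluster_d)):
    length, fraction = dominant_size_class(cluster)
    print(f"{name}: n={len(cluster)}, dominant class={length} nt ({fraction:.0%})")
```

Applied to all 63 clusters, a summary of this kind would show at a glance which expression patterns are associated with 21-, 22- or 24-nt sRNAs.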

With the completion of the 'Heinz 1706' genome, map- 
ping of NGS reads to the reference genome revealed the 
presence of 96 conserved miRNA genes in tomato (The 
Tomato Genome Consortium 2012). Among the 34 miRNA 
families identified, 10 are highly conserved in plants. Inter- 
estingly, the sRNAs specifically mapped to short regions, 
typically a 100-200 bp region within a promoter that pro- 
duces a significant amount of sRNAs. The other interesting 
feature of sRNAs that map to the promoters of protein- 
coding genes is the dynamic expression profile during fruit 
development (The Tomato Genome Consortium 2012). No- 
tably, the majority of these sRNAs that map to the promoters 
of protein-coding genes are 24-nt RNAs, and such RNAs are 
known to mediate methylation or de-methylation of DNA. 
Therefore, the sRNAs that map to promoters may control 
gene expression at the transcriptional level. The biogenesis, 
regulation and function of these sRNAs that map to the pro- 
moters remain to be elucidated. 

It will be intriguing to combine sRNA sequencing with 
other types of sequencing, such as DNA methylome 
sequencing (Lister et al. 2008, Zhang et al. 2006) or cap 
analysis of gene expression (CAGE), which captures 
sequences containing transcription start sites (Shiraki et al. 
2003). Reportedly, region-specific accumulation of sRNA 
and hypermethylation of cytosines, which are both 
associated with DNA methylation-mediated gene regulation, 
correlate with suppression of the corresponding target genes 
in F1 progeny of S. lycopersicum cv 'M82' x S. pennellii LA716 
(Shivaprasad et al. 2012). Additionally, some recent 
findings indicate that DNA methylation is associated with the 
regulation of gene expression during tomato fruit development 
(Teyssier et al. 2008). 

Interspecific and intraspecific comparison of tomato 
genome sequences 

The tomato reference genome was derived from a cultivar of 
S. lycopersicum, designated 'Heinz 1706'. Owing to the 
conserved nature of Solanaceae genomes, availability of the 
reference genome is facilitating the sequencing of 
S. lycopersicum varieties and wild relatives. 

A comparison of the 'Heinz 1706' reference genome with 
the genome of S. pimpinellifolium (accession LA1589), 
which is thought to be a wild ancestor of S. lycopersicum, 
has been reported (The Tomato Genome Consortium 2012). 
Based on the de novo assembly of the S. pimpinellifolium 
genome sequence, the divergence between the two genomes 
was estimated to be 0.6%, or 5.4 million single nucleotide 
polymorphisms (SNPs). As expected from the pedigree of 
'Heinz 1706', which has S. pimpinellifolium as one of its 
ancestors, putative S. pimpinellifolium introgressions were 
detected. Genomic regions with low divergence between 
S. pimpinellifolium and 'Heinz 1706' but with high 
divergence among domesticated cultivars were regarded as 
S. pimpinellifolium introgressions. Based on these criteria, 
40 regions that were considered to be introgressed from 
S. pimpinellifolium were detected. Interestingly, there 
appear to be large introgressions on chromosomes 9 and 11, and 
each introgression is implicated in the breeding of disease 
resistance loci into 'Heinz 1706' using S. pimpinellifolium 
germplasm. 
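
The two criteria used to call introgressions can be expressed as a simple window-based filter. The sketch below is a hypothetical illustration, not the consortium's procedure: the window size, the divergence values, the thresholds and the chromosome labels are all assumptions, and "divergence among domesticated cultivars" is represented by a single per-window value.

```python
from dataclasses import dataclass

@dataclass
class Window:
    chrom: str
    start: int                 # half-open interval [start, end)
    end: int
    div_to_pimp: float         # SNPs per bp, 'Heinz 1706' vs S. pimpinellifolium
    div_among_cultivars: float # SNPs per bp among domesticated cultivars

def candidate_introgressions(windows, low: float = 0.001, high: float = 0.002):
    """Flag windows meeting both criteria and merge adjacent flagged windows."""
    regions = []
    for w in sorted(windows, key=lambda w: (w.chrom, w.start)):
        if w.div_to_pimp <= low and w.div_among_cultivars >= high:
            last = regions[-1] if regions else None
            if last and last[0] == w.chrom and last[2] == w.start:
                regions[-1] = (w.chrom, last[1], w.end)  # extend previous region
            else:
                regions.append((w.chrom, w.start, w.end))
    return regions

# Invented 1 Mb windows for illustration.
windows = [
    Window("ch09", 0, 1_000_000, 0.0005, 0.0030),
    Window("ch09", 1_000_000, 2_000_000, 0.0008, 0.0025),
    Window("ch09", 2_000_000, 3_000_000, 0.0060, 0.0004),
    Window("ch11", 0, 1_000_000, 0.0004, 0.0040),
]

regions = candidate_introgressions(windows)
for chrom, start, end in regions:
    print(f"{chrom}:{start}-{end}")
```

Merging adjacent windows is what turns many flagged intervals into a small number of regions, so the count of regions reported depends on the window size and thresholds chosen.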

Genome projects of domesticated cultivars include that of 
'Micro-Tom', a dwarf cultivar that is regarded as one of the 
model systems in studies of tomato (Meissner et al. 1997). 
Systematic bioresources of 'Micro-Tom', including 
EMS-mutagenized lines, gamma ray-mutagenized lines and 
full-length cDNAs, have been developed and are publicly available 
(Aoki et al. 2010, Saito et al. 2011); these resources make 
'Micro-Tom' a good system for tomato genomics. Reportedly, 
'Micro-Tom' has a relatively large number of loci that are 
polymorphic when compared with the respective loci in other 
cultivated tomatoes (Shirasawa et al. 2010). We conducted 
genome sequencing of 'Micro-Tom' (accession: DRX000482, 
DRX000454, DRX000455, DRX000627 and DRX000628) 
and identified approximately 1,230,000 SNPs and 190,000 
indels in a comparison of the 'Micro-Tom' sequence with the 
'Heinz 1706' reference genome sequence (unpublished 
data). This means that there is one nucleotide difference 
between the two genomes in every 700 bases. This frequency 
appears to be higher than that observed in an intra-specific 
comparison of rice cultivars, where one SNP was identified 
in every 2,890 bases (Arai-Kichise et al. 2011). This result is 
consistent with the finding that there are many 
polymorphic loci in 'Micro-Tom' (Shirasawa et al. 2010). 
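
The variant spacing quoted above follows from dividing a genome length by a variant count. The text does not state which genome length was used as the denominator, so the sketch below makes that an explicit parameter; using the 760 Mb assembled length is an assumption, and the helper name is hypothetical.

```python
def bases_per_variant(genome_bp: float, n_variants: int) -> float:
    """Average spacing between variants, in bases."""
    return genome_bp / n_variants

ASSEMBLED_BP = 760e6     # assembled 'Heinz 1706' length; assumed denominator
SNPS = 1_230_000
INDELS = 190_000
RICE_BASES_PER_SNP = 2890  # from Arai-Kichise et al. (2011), as cited

snp_spacing = bases_per_variant(ASSEMBLED_BP, SNPS)
all_variant_spacing = bases_per_variant(ASSEMBLED_BP, SNPS + INDELS)

# Genome length that would give exactly one SNP per 700 bases.
implied_genome_bp = 700 * SNPS

print(f"one SNP per {snp_spacing:.0f} bases (760 Mb denominator)")
print(f"one SNP or indel per {all_variant_spacing:.0f} bases")
print(f"700-base spacing implies a denominator of {implied_genome_bp / 1e6:.0f} Mb")
print(f"SNP frequency relative to rice: {RICE_BASES_PER_SNP / snp_spacing:.1f}x")
```

With the assembled length as denominator the spacing comes out somewhat shorter than the rounded figure of 700 bases, but under either denominator the SNP frequency remains several times higher than that reported for rice cultivars.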

A comprehensive clade-oriented genome sequencing 
project is ongoing as a collective effort of the Solanaceae 
research community; this collaboration is called the 
SOL-100 project (http://solgenomics.net/organism/sol100/view). 
In the SOL-100 framework, 17 genome sequencing projects 
of S. lycopersicum cultivars are currently registered 
(September 2012), including 'Ailsa Craig', 'Rutgers', 'M82' 
and 'Micro-Tom', which are popular cultivars in tomato 
experimental studies (http://solgenomics.net/organism/1/view). 
Although most of these datasets are not yet publicly 
available, they will serve as excellent information 
resources for developing SNP markers and intra-specific maps 
(Saliba-Colombani et al. 2000, Shirasawa et al. 2010). 

Conclusion 

The highly accurate 'Heinz 1706' reference genome sequence 
paves the way for sequencing-based functional genomics of 
tomato and of its wild relatives. In this review, we presented 
transcriptome analyses as one field that will benefit from 
this reference genome. The mapping of NGS reads onto the 
reference genome facilitates quantitative estimation of the 
expression levels of any transcripts — including those de- 
rived from annotated genes and non-annotated transcription 
units such as ncRNAs. Additionally, sRNA sequencing may 
accelerate the discovery of novel mechanisms of transcrip- 
tional and post-transcriptional regulation of tomato genes 
during fruit development. 

We also described the sequencing of genomes of 
cultivated tomatoes and wild relatives of tomatoes. 
Combining detailed phenotyping of cultivated tomatoes (for 
example, http://www.phenome-networks.com/home; 
http://solgenomics.net/search/phenotypes) with genome 
sequencing facilitates the association of DNA sequences with 
agronomically important traits. Systematic development of 
genomics bioresources (Ariizumi et al. 2011, Bombarely et al. 
2011, Carvalho et al. 2011) also helps us exploit the wealth 
of the Solanaceae genome sequence. 